Design of Vietnamese Speech Corpus and Current Status

Authors

  • Luong Chi Mai
  • Dang Ngoc Duc

Abstract

This paper presents the current status of, and ongoing activities on, spoken language resources for Vietnamese at research institutions such as the Institute of Information Technology, Vietnamese Academy of Science and Technology, and the International Research Center MICA, Hanoi University of Technology. This is our first step toward building a large Vietnamese speech database. The corpora should follow a common design so that they can be made available to researchers in Vietnamese speech processing.

Similar resources

Fujisaki model based F0 contours in vietnamese TTS

The current paper presents preliminary work towards the integration of the Fujisaki model into the VnVoice Vietnamese TTS system, based on a set of rules to control the F0 contour. A speech corpus consisting of 20 sentences was compiled. Each of the sentences can have various meanings depending on the tone associated with a monosyllabic keyword which it contains. The corpus with a total of 46 s...

Statistical Analysis of Vietnamese Dialect Corpus and Dialect Identification Experiments

The performance of speech recognition systems will be improved if the corpus is organized by specialized domain and applied consistently to speech recognition in specific situations. Vietnamese dialects are diverse. Building a corpus of Vietnamese dialects is the first step toward implementing a dialect identification system that can increase the performance of Vietnames...

Optimization on Vietnamese large vocabulary speech recognition

This paper summarizes our latest efforts toward a large vocabulary speech recognition system for Vietnamese. We describe the Vietnamese text and speech database which we collected as part of our GlobalPhone corpus. Based on these data we improve our initial Vietnamese recognition system [1] by applying various state-of-the-art techniques such as semi-tied covariance and discriminative training....

First steps in building a large vocabulary continuous speech recognition system for Vietnamese

This paper presents an overview of our activities for building a Large Vocabulary Continuous Speech Recognition (LVCSR) system for Vietnamese implemented at CLIPS-IMAG Laboratory (France) and International Research Center MICA (Vietnam). Firstly, a new methodology for fast text corpora acquisition for minority languages which has been applied to Vietnamese is proposed. Secondly, the first resul...

Towards a Multi-Objective Corpus for Vietnamese Language

Today, corpora play an important role in the development and evaluation of language and speech technologies, such as part-of-speech tagging, parsing, word sense disambiguation, text categorization, named entity classification, information extraction, question answering, structure discovery (clustering), speech recognition and machine translation systems, etc. One can exploit valuable statistical param...

Publication date: 2006